Pass train_dtype into _load_model_memory_efficient #106

Closed
RobotSail wants to merge 1 commit into main from agent/mini-trainer-dev/2a5498f8

@RobotSail

Collaborator

Summary

  • Add explicit train_dtype parameter to _load_model_memory_efficient() instead of extracting it indirectly from base_kwargs["torch_dtype"]
  • Thread train_dtype through the full OSFT distributed loading path: setup_model() → setup_osft_model_distributed() → from_pretrained() → _load_model_memory_efficient()
  • Maintain backward compatibility: falls back to torch_dtype from base_kwargs when train_dtype is not provided
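The priority rule in the summary (an explicit train_dtype wins, otherwise torch_dtype from base_kwargs is used) can be sketched as a small helper. resolve_train_dtype is a hypothetical name, not a function in mini_trainer, and plain strings stand in for torch.dtype values so the sketch runs without PyTorch.

```python
from typing import Any, Optional


def resolve_train_dtype(base_kwargs: dict, train_dtype: Optional[Any] = None) -> Any:
    # Hypothetical helper illustrating the precedence rule; not mini_trainer code.
    # An explicitly passed train_dtype always takes priority.
    if train_dtype is not None:
        return train_dtype
    # Backward-compatible path: callers that never pass train_dtype keep the
    # old behavior of reading it from base_kwargs["torch_dtype"].
    return base_kwargs.get("torch_dtype")


# Strings stand in for torch.dtype objects (e.g. torch.bfloat16).
kwargs = {"torch_dtype": "bfloat16"}
print(resolve_train_dtype(kwargs, train_dtype="float32"))  # float32
print(resolve_train_dtype(kwargs))  # bfloat16
```

Returning None when neither value is present leaves that decision to the caller; the PR does not say how the real loader handles that case.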

Closes #34

Test plan

  • Added 2 new unit tests: one verifying train_dtype overrides base_kwargs["torch_dtype"], one verifying backward-compatible fallback
  • All 3 tests in TestLazyInitTokenizerAlignment pass
  • Full non-GPU test suite passes (307 passed, 2 skipped, 0 failures)
  • Ruff lint and format checks pass
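The two new unit tests check one rule from two sides: the override case and the fallback case. A minimal pytest-style sketch of that shape follows. load_model_sketch is a hypothetical stand-in that returns the kwargs it would hand to a model constructor; it is not the test code added to tests/test_osft.py.

```python
from typing import Any, Optional


def load_model_sketch(base_kwargs: dict, train_dtype: Optional[Any] = None) -> dict:
    # Hypothetical stand-in for a memory-efficient loader: returns the kwargs
    # it would pass on, without mutating the caller's dict.
    dtype = train_dtype if train_dtype is not None else base_kwargs.get("torch_dtype")
    return {**base_kwargs, "torch_dtype": dtype}


def test_train_dtype_overrides_base_kwargs():
    # Explicit train_dtype must win over base_kwargs["torch_dtype"].
    out = load_model_sketch({"torch_dtype": "float32"}, train_dtype="bfloat16")
    assert out["torch_dtype"] == "bfloat16"


def test_fallback_to_base_kwargs_torch_dtype():
    # Without train_dtype, the legacy torch_dtype value is used unchanged.
    out = load_model_sketch({"torch_dtype": "float32"})
    assert out["torch_dtype"] == "float32"


if __name__ == "__main__":
    test_train_dtype_overrides_base_kwargs()
    test_fallback_to_base_kwargs_torch_dtype()
    print("ok")
```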

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

The OSFT memory-efficient loading path was extracting the training dtype
indirectly from base_kwargs["torch_dtype"] instead of accepting it as an
explicit parameter. This made it fragile and inconsistent with the SFT
distributed loading path which takes train_dtype directly.

Thread train_dtype through: setup_model() → setup_osft_model_distributed()
→ from_pretrained() → _load_model_memory_efficient(), with backward-
compatible fallback to torch_dtype from base_kwargs when not provided.

Closes #34

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: multica-agent <github@multica.ai>
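The commit message describes forwarding train_dtype through four layers. A simplified sketch of that pattern follows, assuming each layer accepts an optional train_dtype keyword and passes it on unchanged. The _stub functions are hypothetical stand-ins with reduced signatures, not the real setup_model(), setup_osft_model_distributed(), from_pretrained(), or _load_model_memory_efficient().

```python
from typing import Any, Optional


def load_model_memory_efficient_stub(base_kwargs: dict, train_dtype: Optional[Any] = None) -> Any:
    # Innermost layer: explicit train_dtype wins, else the legacy torch_dtype.
    if train_dtype is None:
        train_dtype = base_kwargs.get("torch_dtype")
    return train_dtype


def from_pretrained_stub(base_kwargs: dict, train_dtype: Optional[Any] = None) -> Any:
    # Intermediate layers forward train_dtype by keyword, unchanged.
    return load_model_memory_efficient_stub(base_kwargs, train_dtype=train_dtype)


def setup_osft_model_distributed_stub(base_kwargs: dict, train_dtype: Optional[Any] = None) -> Any:
    return from_pretrained_stub(base_kwargs, train_dtype=train_dtype)


def setup_model_stub(base_kwargs: dict, train_dtype: Optional[Any] = None) -> Any:
    # Entry point: the caller's train_dtype reaches the loader as its own
    # argument instead of being read back out of base_kwargs.
    return setup_osft_model_distributed_stub(base_kwargs, train_dtype=train_dtype)


print(setup_model_stub({"torch_dtype": "float16"}, train_dtype="bfloat16"))  # bfloat16
print(setup_model_stub({"torch_dtype": "float16"}))  # float16
```

Because the dtype travels as a separate keyword, no intermediate layer has to know that it used to live inside base_kwargs, which is the indirection the commit message calls fragile.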
@coderabbitai

coderabbitai Bot commented Jun 2, 2026

Warning

Review limit reached

@RobotSail, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 4 minutes.

Your organization has run out of usage credits.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 54651a2f-7a95-4e2c-a323-8efdb5a50d2c

📥 Commits

Reviewing files that changed from the base of the PR and between e3db6da and a7afa47.

📒 Files selected for processing (3)
  • src/mini_trainer/osft_utils.py
  • src/mini_trainer/setup_model_for_training.py
  • tests/test_osft.py

@codecov

codecov Bot commented Jun 2, 2026

Codecov Report

❌ Patch coverage is 50.00000% with 1 line in your changes missing coverage. Please review.

Files with missing lines | Patch % | Lines
src/mini_trainer/osft_utils.py | 50.00% | 1 Missing ⚠️
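The 50% figure follows from simple arithmetic. Patch coverage counts only the coverable changed lines, as hits divided by hits plus misses plus partials. Assuming no partially covered lines (the table lists a single Missing entry), 50% with 1 missed line implies the patch added 2 coverable lines to osft_utils.py, 1 of which ran under the tests. The snippet is an illustrative calculation with a hypothetical function name, not Codecov code.

```python
def patch_coverage(hits: int, misses: int, partials: int = 0) -> float:
    """Percentage of coverable changed lines executed by the tests.

    Illustrative ratio: hits / (hits + misses + partials).
    """
    total = hits + misses + partials
    if total == 0:
        raise ValueError("patch has no coverable lines")
    return 100.0 * hits / total


# One executed line and one missed line in the patch gives the reported 50%.
print(patch_coverage(hits=1, misses=1))  # 50.0
```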

@RobotSail RobotSail closed this Jun 2, 2026

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Pass train_dtype into _load_gpt_oss_model_memory_efficient

1 participant